AITopics | topic 0

topic 0

Information about AI from the News, Publications, and Conferences

Automatic Classification – Tagging and Summarization – Customizable Filtering and Analysis

If you are looking for an answer to the question What is Artificial Intelligence? and you only have a minute, then here's the definition the Association for the Advancement of Artificial Intelligence offers on its home page: "the scientific understanding of the mechanisms underlying thought and intelligent behavior and their embodiment in machines."

However, if you are fortunate enough to have more than a minute, then please get ready to embark upon an exciting journey exploring AI (but beware, it could last a lifetime) …

When Detection Fails: The Power of Fine-Tuned Models to Generate Human-Like Social Media Text

Dawkins, Hillary, Fraser, Kathleen C., Kiritchenko, Svetlana

arXiv.org Artificial Intelligence | Jun-17-2025

Detecting AI-generated text is a difficult problem to begin with; detecting AI-generated text on social media is made even more difficult due to the short text length and informal, idiosyncratic language of the internet. It is nonetheless important to tackle this problem, as social media represents a significant attack vector in online influence campaigns, which may be bolstered through the use of mass-produced AI-generated posts supporting (or opposing) particular policies, decisions, or events. We approach this problem with the mindset and resources of a reasonably sophisticated threat actor, and create a dataset of 505,159 AI-generated social media posts from a combination of open-source, closed-source, and fine-tuned LLMs, covering 11 different controversial topics. We show that while the posts can be detected under typical research assumptions about knowledge of and access to the generating models, under the more realistic assumption that an attacker will not release their fine-tuned model to the public, detectability drops dramatically. This result is confirmed with a human study. Ablation experiments highlight the vulnerability of various detection algorithms to fine-tuned LLMs. This result has implications across all detection domains, since fine-tuning is a generally applicable and realistic LLM use case.

detector, large language model, machine learning, (23 more...)

2506.09975

Country:

North America > United States (1.00)
Europe (1.00)
Asia > Middle East > UAE (0.28)

Genre:

Research Report > New Finding (1.00)
Research Report > Experimental Study (1.00)

Industry:

Media (1.00)
Information Technology > Security & Privacy (1.00)
Health & Medicine > Therapeutic Area > Infections and Infectious Diseases (1.00)
(7 more...)

Technology:

Information Technology > Communications > Social Media (1.00)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Natural Language > Chatbot (1.00)
(2 more...)

Multivariate Gaussian Topic Modelling: A novel approach to discover topics with greater semantic coherence

Sahoo, Satyajeet, Maiti, Jhareswar, Tewari, Virendra Kumar

arXiv.org Artificial Intelligence | Mar-19-2025

An important aspect of text mining involves information retrieval in the form of discovery of semantic themes (topics) from documents using topic modelling. While generative topic models like Latent Dirichlet Allocation (LDA) elegantly model topics as probability distributions and are useful in identifying latent topics from large document corpora with minimal supervision, they suffer from difficulty in topic interpretability and reduced performance in shorter texts. Here we propose a novel Multivariate Gaussian Topic modelling (MGD) approach. In this approach topics are presented as Multivariate Gaussian Distributions and documents as Gaussian Mixture Models. Using the EM algorithm, the various constituent Multivariate Gaussian Distributions and their corresponding parameters are identified. Analysis of the parameters helps identify the keywords having the highest variance and mean contributions to the topic, and from these keywords topic annotations are carried out. This approach is first applied on a synthetic dataset to demonstrate the interpretability benefits vis-à-vis LDA. A real-world application of this topic model is demonstrated in analysis of risks and hazards at a petrochemical plant by applying the model on safety incident reports to identify the major latent hazards plaguing the plant. This model achieves a higher mean topic coherence of 0.436 vis-à-vis 0.294 for LDA.
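As a rough illustration of the EM step mentioned in the abstract above, the following is a minimal sketch of expectation-maximization for a Gaussian mixture. It is not the paper's MGD method: it assumes diagonal covariances, toy 2-D vectors instead of document features, and hypothetical names (`em_gmm`, `log_gauss_diag`) chosen only for this example.

```python
import math
import random


def log_gauss_diag(x, mean, var):
    """Log-density of a Gaussian with diagonal covariance (assumption: no cross terms)."""
    return sum(
        -0.5 * (math.log(2 * math.pi * v) + (xi - m) ** 2 / v)
        for xi, m, v in zip(x, mean, var)
    )


def em_gmm(points, k, iterations=50, seed=0, min_var=1e-6):
    """Fit a k-component diagonal Gaussian mixture with plain EM."""
    rng = random.Random(seed)
    dim = len(points[0])
    means = [list(p) for p in rng.sample(points, k)]  # start from k data points
    variances = [[1.0] * dim for _ in range(k)]
    weights = [1.0 / k] * k
    for _ in range(iterations):
        # E-step: responsibility of each component for each point,
        # computed in log space to avoid underflow.
        resp = []
        for x in points:
            logs = [
                math.log(weights[j]) + log_gauss_diag(x, means[j], variances[j])
                for j in range(k)
            ]
            top = max(logs)
            probs = [math.exp(value - top) for value in logs]
            total = sum(probs)
            resp.append([p / total for p in probs])
        # M-step: re-estimate weight, mean, and per-dimension variance.
        for j in range(k):
            nj = sum(r[j] for r in resp) + 1e-12  # guard against an empty component
            weights[j] = nj / len(points)
            means[j] = [
                sum(r[j] * x[d] for r, x in zip(resp, points)) / nj
                for d in range(dim)
            ]
            variances[j] = [
                max(
                    min_var,
                    sum(r[j] * (x[d] - means[j][d]) ** 2 for r, x in zip(resp, points)) / nj,
                )
                for d in range(dim)
            ]
    return weights, means, variances


# Toy data: two synthetic clusters (made up for this example).
data_rng = random.Random(1)
toy_points = [[data_rng.gauss(0, 1), data_rng.gauss(0, 1)] for _ in range(30)]
toy_points += [[data_rng.gauss(8, 1), data_rng.gauss(8, 1)] for _ in range(30)]
toy_weights, toy_means, toy_variances = em_gmm(toy_points, k=2)
print(toy_weights, toy_means)
```

In this sketch, the dimensions with the largest fitted mean or variance in a component play the role that the abstract assigns to keywords with the highest mean and variance contributions to a topic.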

artificial intelligence, natural language, topic model, (16 more...)

2503.15036

Country:

Asia > India > West Bengal > Kharagpur (0.05)
Oceania > Australia (0.04)
North America > United States > California > Santa Clara County > Palo Alto (0.04)
Asia > China > Beijing > Beijing (0.04)

Genre:

Research Report > Promising Solution (0.40)
Overview > Innovation (0.40)

Industry:

Materials > Chemicals > Commodity Chemicals > Petrochemicals (0.69)
Government (0.67)
Leisure & Entertainment > Sports > Cricket (0.46)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Discourse & Dialogue (0.92)
Information Technology > Artificial Intelligence > Natural Language > Text Processing (0.88)

Towards Conditioning Clinical Text Generation for User Control

Koraş, Osman Alperen, Bahnan, Rabi, Kleesiek, Jens, Dada, Amin

arXiv.org Artificial Intelligence | Feb-24-2025

Deploying natural language generation systems in clinical settings remains challenging despite advances in Large Language Models (LLMs), which continue to exhibit hallucinations and factual inconsistencies, necessitating human oversight. This paper explores automated dataset augmentation using LLMs as human proxies to condition LLMs for clinician control without increasing cognitive workload. On the BioNLP ACL'24 Discharge Me! Shared Task, we achieve new state-of-the-art results with simpler methods than prior submissions through more efficient training, yielding a 9% relative improvement without augmented training and up to 34% with dataset augmentation. Preliminary human evaluation further supports the effectiveness of our approach, highlighting the potential of augmenting clinical text generation for control to enhance relevance, accuracy, and factual consistency.

discharge summary, guideline, text generation, (14 more...)

2502.17571

Country:

Asia > Thailand > Bangkok > Bangkok (0.04)
North America > United States > Pennsylvania > Philadelphia County > Philadelphia (0.04)
North America > United States > Michigan > Washtenaw County > Ann Arbor (0.04)
(10 more...)

Genre: Research Report > New Finding (0.67)

Industry:

Health & Medicine > Therapeutic Area > Infections and Infectious Diseases (1.00)
Health & Medicine > Health Care Providers & Services (0.94)
Health & Medicine > Pharmaceuticals & Biotechnology (0.93)
Health & Medicine > Health Care Technology (0.93)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

Leveraging Large Language Models and Topic Modeling for Toxicity Classification

Oskouie, Haniyeh Ehsani, Chance, Christina, Huang, Claire, Capetz, Margaret, Eyeson, Elizabeth, Sarrafzadeh, Majid

arXiv.org Artificial Intelligence | Nov-26-2024

Content moderation and toxicity classification represent critical tasks with significant social implications. However, studies have shown that major classification models exhibit tendencies to magnify or reduce biases and potentially overlook or disadvantage certain marginalized groups within their classification processes. Researchers suggest that the positionality of annotators influences the gold-standard labels, so models that learn from those labels propagate the annotators' bias. To further investigate the impact of annotator positionality, we delve into fine-tuning BERTweet and HateBERT on the dataset while using topic-modeling strategies for content moderation. The results indicate that fine-tuning the models on specific topics results in a notable improvement in the F1 score of the models when compared to the predictions generated by other prominent classification models such as GPT-4, PerspectiveAPI, and RewireAPI. These findings further reveal that the state-of-the-art large language models exhibit significant limitations in accurately detecting and interpreting text toxicity contrasted with earlier methodologies. Code is available at https://github.com/aheldis/Toxicity-Classification.git.

large language model, machine learning, natural language, (18 more...)

2411.17876

Country:

North America > United States > California > Los Angeles County > Los Angeles (0.33)
Asia > Middle East > Jordan (0.05)
North America > Canada > Ontario > Toronto (0.04)
(3 more...)

Genre: Research Report > New Finding (0.88)

Industry: Information Technology (0.46)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Performance Analysis > Accuracy (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.68)

$S^3$ -- Semantic Signal Separation

Kardos, Márton, Kostkan, Jan, Vermillet, Arnault-Quentin, Nielbo, Kristoffer, Enevoldsen, Kenneth, Rocca, Roberta

arXiv.org Machine Learning | Jun-18-2024

Topic models are useful tools for discovering latent semantic structures in large textual corpora. Topic modeling historically relied on bag-of-words representations of language. This approach makes models sensitive to the presence of stop words and noise, and does not utilize potentially useful contextual information. Recent efforts have been oriented at incorporating contextual neural representations in topic modeling and have been shown to outperform classical topic models. These approaches are, however, typically slow, volatile and still require preprocessing for optimal results. We present Semantic Signal Separation ($S^3$), a theory-driven topic modeling approach in neural embedding spaces. $S^3$ conceptualizes topics as independent axes of semantic space, and uncovers these with blind-source separation. Our approach provides the most diverse, highly coherent topics, requires no preprocessing, and is demonstrated to be the fastest contextually sensitive topic model to date. We offer an implementation of $S^3$, among other approaches, in the Turftopic Python package.

topic 0, topic 13, topic 15, (16 more...)

2406.09556

Country:

Asia > Middle East > Palestine > Gaza Strip > Gaza Governorate > Gaza (0.04)
North America > United States > Ohio (0.04)
Asia > Armenia (0.04)
(17 more...)

Genre: Research Report (1.00)

Industry:

Leisure & Entertainment > Sports > Hockey (1.00)
Information Technology (1.00)
Health & Medicine > Therapeutic Area (1.00)
(4 more...)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Text Processing (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.93)
Information Technology > Artificial Intelligence > Natural Language > Discourse & Dialogue (0.76)

Latent Dirichlet Allocation

#artificialintelligence | Aug-1-2022, 16:45:45 GMT

Latent Dirichlet Allocation, or LDA for short, is an unsupervised machine learning algorithm. Similar to the clustering algorithm K-means, LDA will attempt to group words and documents into a predefined number of clusters (i.e. topics). These topics can then be used to organize and search through documents. One of the most popular methods for estimating the LDA model is Gibbs sampling. Let's walk through one iteration of the algorithm.
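The article's own walkthrough is not part of this excerpt. As a stand-in, here is a minimal sketch of a generic collapsed Gibbs sampler for LDA in plain Python. The function name `gibbs_lda`, the hyperparameters `alpha` and `beta`, and the toy corpus are assumptions made for illustration; words are assumed to be pre-encoded as integer ids.

```python
import random


def gibbs_lda(docs, num_topics, vocab_size, alpha=0.1, beta=0.01,
              iterations=50, seed=0):
    """Collapsed Gibbs sampling for LDA over documents given as lists of word ids."""
    rng = random.Random(seed)
    doc_topic = [[0] * num_topics for _ in docs]               # words in doc d assigned to topic k
    topic_word = [[0] * vocab_size for _ in range(num_topics)]  # times word w is assigned to topic k
    topic_total = [0] * num_topics                              # total words assigned to topic k
    assignments = []

    # Start from a random topic for every word occurrence.
    for d, doc in enumerate(docs):
        doc_assignments = []
        for w in doc:
            z = rng.randrange(num_topics)
            doc_assignments.append(z)
            doc_topic[d][z] += 1
            topic_word[z][w] += 1
            topic_total[z] += 1
        assignments.append(doc_assignments)

    for _ in range(iterations):
        # One iteration: resample the topic of each word occurrence in turn.
        for d, doc in enumerate(docs):
            for i, w in enumerate(doc):
                # Remove the current assignment from the counts.
                z = assignments[d][i]
                doc_topic[d][z] -= 1
                topic_word[z][w] -= 1
                topic_total[z] -= 1
                # Unnormalized conditional probability of each topic:
                # (how much doc d likes topic k) * (how much topic k likes word w).
                weights = [
                    (doc_topic[d][k] + alpha)
                    * (topic_word[k][w] + beta)
                    / (topic_total[k] + vocab_size * beta)
                    for k in range(num_topics)
                ]
                z = rng.choices(range(num_topics), weights=weights)[0]
                # Record the new assignment.
                assignments[d][i] = z
                doc_topic[d][z] += 1
                topic_word[z][w] += 1
                topic_total[z] += 1
    return assignments, doc_topic, topic_word


# Toy corpus: word ids 0-1 and 2-3 tend to co-occur (made up for this example).
toy_docs = [[0, 1, 0, 1], [2, 3, 2, 3], [0, 1, 1], [3, 2, 2]]
toy_assignments, toy_doc_topic, toy_topic_word = gibbs_lda(toy_docs, num_topics=2, vocab_size=4)
print(toy_doc_topic)
```

The number of topics is fixed up front, which matches the "predefined number of clusters" described above; the per-document counts in `doc_topic` are what one would normalize to get each document's topic mixture.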

latent dirichlet allocation, probability, topic 0, (11 more...)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Text Processing (0.71)
Information Technology > Artificial Intelligence > Natural Language > Discourse & Dialogue (0.61)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Clustering (0.55)

Topic Modeling with Scikit Learn – ML Review – Medium

@machinelearnbot | Dec-20-2017, 20:50:21 GMT

Latent Dirichlet Allocation (LDA) is an algorithm used to discover the topics that are present in a corpus. A few open source libraries exist, but if you are using Python then the main contender is Gensim. Gensim is an awesome library and scales really well to large text corpora. Gensim, however, does not include Non-negative Matrix Factorization (NMF), which can also be used to find topics in text. The mathematical basis underpinning NMF is quite different from LDA.
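To make the contrast with LDA concrete, here is a minimal sketch of NMF in plain Python. It uses the classic multiplicative update rules for the squared-error objective, which is an assumption of this example and not necessarily what scikit-learn or the linked article uses. The helper names and the toy document-term matrix are hypothetical.

```python
import random


def matmul(a, b):
    """Multiply two matrices stored as lists of rows."""
    return [
        [sum(a[i][k] * b[k][j] for k in range(len(b))) for j in range(len(b[0]))]
        for i in range(len(a))
    ]


def transpose(m):
    return [list(col) for col in zip(*m)]


def frobenius_error(v, w, h):
    """Squared reconstruction error ||V - WH||^2."""
    wh = matmul(w, h)
    return sum(
        (v[i][j] - wh[i][j]) ** 2 for i in range(len(v)) for j in range(len(v[0]))
    )


def nmf(v, num_topics, iterations=200, seed=0, eps=1e-9):
    """Factor a non-negative document-term matrix V into W (doc x topic) and H (topic x term)."""
    rng = random.Random(seed)
    n_docs, n_terms = len(v), len(v[0])
    w = [[rng.random() + eps for _ in range(num_topics)] for _ in range(n_docs)]
    h = [[rng.random() + eps for _ in range(n_terms)] for _ in range(num_topics)]
    for _ in range(iterations):
        # H <- H * (W^T V) / (W^T W H), element-wise
        wt = transpose(w)
        num = matmul(wt, v)
        den = matmul(matmul(wt, w), h)
        h = [
            [h[k][j] * num[k][j] / (den[k][j] + eps) for j in range(n_terms)]
            for k in range(num_topics)
        ]
        # W <- W * (V H^T) / (W H H^T), element-wise
        ht = transpose(h)
        num = matmul(v, ht)
        den = matmul(w, matmul(h, ht))
        w = [
            [w[i][k] * num[i][k] / (den[i][k] + eps) for k in range(num_topics)]
            for i in range(n_docs)
        ]
    return w, h


# Toy document-term counts: rows are documents, columns are terms (made up).
toy_matrix = [[2, 1, 0, 0], [3, 2, 0, 0], [0, 0, 1, 3], [0, 0, 2, 2]]
toy_w, toy_h = nmf(toy_matrix, num_topics=2)
print(frobenius_error(toy_matrix, toy_w, toy_h))
```

Where LDA fits a probabilistic generative model, this sketch only minimizes a reconstruction error under non-negativity constraints; the rows of `H` can then be read as term weights per topic.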

artificial intelligence, natural language, text processing, (20 more...)

Industry: Leisure & Entertainment (0.30)

Technology:

Information Technology > Communications > Social Media (0.51)
Information Technology > Artificial Intelligence > Natural Language > Text Processing (0.39)
